A Finite State Transducer (FST) based Font Converter

Authors

  • Sriram Chaudhury
  • Shubhamay Sen
  • Gyanranjan Nandi
  • Steve Comstock
  • Akshar Bharati
  • Nisha Sangal
  • Vineet Chaitanya
  • Rajeev Sangal
  • Anand Arokia Raj
  • Himanshu Garg
  • Mikel L. Forcada

Abstract

This paper describes a rule-based approach to developing an Oriya font converter that converts text in the proprietary SAMBAD and AKRUTI fonts to a standardized Unicode font. Such conversion supports electronic storage of information in the native language itself, as well as proper search and retrieval. Our approach relies mainly on the Apertium machine translation tool, which uses finite state transducers, to convert the symbolic data to standardized Unicode Oriya text. It requires two resources: a map table that maps the commonly used Oriya syllables in the proprietary font to their corresponding font codes, and a dictionary that specifies the rules for mapping the proprietary font codes to Unicode. Any unhandled symbols that remain in the intermediate converted file are then rectified with the Flex scanner tool. The converted text is in a standard Unicode font and remains unchanged across platforms, since Unicode is supported by almost all of them.
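The two-stage idea in the abstract (table-driven conversion, then a cleanup pass for unhandled symbols) can be illustrated with a minimal Python sketch. It is not the Apertium or Flex pipeline the paper uses: it swaps in a plain greedy longest-match lookup, and the legacy codes in `MAP_TABLE` are hypothetical placeholders, not the real SAMBAD or AKRUTI code tables. Only the Unicode code points on the right-hand side are actual Oriya characters.

```python
# Hypothetical legacy-font codes -> Unicode Oriya.
# The keys are placeholders for illustration only; they are NOT the
# real SAMBAD or AKRUTI font codes.
MAP_TABLE = {
    "k": "\u0B15",               # ORIYA LETTER KA
    "K": "\u0B16",               # ORIYA LETTER KHA
    "a": "\u0B3E",               # ORIYA VOWEL SIGN AA
    "kx": "\u0B15\u0B4D\u0B37",  # conjunct: KA + VIRAMA + SSA
}


def convert(text, table=MAP_TABLE):
    """Greedy longest-match conversion of legacy codes to Unicode.

    Returns the converted string and a list of (position, symbol)
    pairs for input symbols that no rule matched, so that a later
    cleanup pass can deal with them.
    """
    max_len = max(len(key) for key in table)
    out = []
    unhandled = []
    i = 0
    while i < len(text):
        # Try the longest candidate first so multi-code syllables win
        # over their single-code prefixes.
        for size in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += size
                break
        else:
            # No rule matched: copy the symbol through and record it.
            out.append(text[i])
            unhandled.append((i, text[i]))
            i += 1
    return "".join(out), unhandled


if __name__ == "__main__":
    converted, leftovers = convert("kxa#k")
    print(converted)
    print(leftovers)
```

Trying the longest match first matters because a syllable-level entry such as `kx` would otherwise be split into its single-code prefix. The list of leftovers plays the role of the symbols that the abstract says are rectified in a second pass.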

Similar resources

A Flexible, Scalable Finite-state Transducer Architecture for Corpus-based Concatenative Speech Synthesis

In this paper we describe our work involving the conversion of our phonologically-based synthesizer into a finite-state transducer (FST) representation which can be used for real-time natural-sounding synthesis. We have designed a transducer structure to efficiently perform the common task of unit selection in concatenative speech synthesis. By encapsulating domain-independent concatenative synt...


N-gram FST Indexing for Spoken Term Detection

An efficient indexing scheme is essential for spoken term detection (STD) on large databases, particularly for phone-based systems, which have been widely adopted to achieve vocabulary-independent detection. While finite state transducer (FST) composition provides a standard indexing approach, n-gram reverse indexing is more flexible in connectivity representation and confiden...


Injection Structures Specified by Finite State Transducers

An injection structure A = (A, f) is a set A together with a one-place one-to-one function f. A is an FST injection structure if A is a regular set, that is, the set of words accepted by some finite automaton, and f is realized by a finite-state transducer. We initiate the study of FST injection structures. We show that the model checking problem for FST injection structures is undecidable whi...


Automated Essay Scoring Based on Finite State Transducer: towards ASR Transcription of Oral English Speech

Conventional Automated Essay Scoring (AES) measures may cause severe problems when applied directly to scoring Automatic Speech Recognition (ASR) transcriptions, as they are error-sensitive and ill-suited to the characteristics of ASR transcriptions. Therefore, we introduce a Finite State Transducer (FST) framework to avoid these shortcomings. Compared with the Latent Semantic Analysis with Suppo...




Publication date: 2012